Informative Data Projections: A Framework and Two Examples

Authors

  • Tijl De Bie
  • Jefrey Lijffijt
  • Raúl Santos-Rodriguez
  • Bo Kang
Abstract

Projection Pursuit (PP) aims to facilitate visual exploration of high-dimensional data by identifying interesting low-dimensional projections. A major challenge in PP is the design of a projection index, that is, a suitable quality measure to maximise. We introduce a strategy for tackling this problem based on quantifying the amount of information a projection conveys, given a user’s prior beliefs about the data. The resulting projection index is a subjective quantity, explicitly dependent on the intended user. As an illustration, we develop this principle for two kinds of prior beliefs: the first leads to PCA, and the second leads to a novel projection index, which we call t-PCA, that can be regarded as a robust PCA variant. We demonstrate t-PCA’s usefulness in comparative experiments against PCA and FastICA, a popular PP method.
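
To make the notion of a projection index concrete, the sketch below treats the variance of the projected data as the index and finds the unit direction that maximises it, which is the textbook characterisation of the first PCA component. It is only an illustration under stated assumptions: the function names (`center`, `variance_index`, `top_direction`), the toy data, and the use of power iteration are choices made here, not taken from the paper, and the sketch does not attempt to reproduce the paper's derivation from prior beliefs or its t-PCA index.

```python
import math
import random


def center(rows):
    """Subtract the column means so that every feature has zero mean."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    return [[r[j] - means[j] for j in range(d)] for r in rows]


def variance_index(rows, w):
    """Illustrative projection index: variance of centered data projected onto w."""
    proj = [sum(x * wj for x, wj in zip(r, w)) for r in rows]
    return sum(p * p for p in proj) / len(proj)


def top_direction(rows, iters=200, seed=0):
    """Power iteration on the covariance matrix of centered data.

    Returns a unit vector that (approximately) maximises variance_index.
    """
    n, d = len(rows), len(rows[0])
    cov = [[sum(r[i] * r[j] for r in rows) / n for j in range(d)]
           for i in range(d)]
    rng = random.Random(seed)
    w = [rng.gauss(0, 1) for _ in range(d)]
    for _ in range(iters):
        w = [sum(cov[i][j] * w[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(v * v for v in w))
        w = [v / norm for v in w]
    return w


# Hypothetical toy data: points spread widely along (1, 1),
# with a small amount of noise along (1, -1).
rng = random.Random(1)
data = []
for _ in range(200):
    t = rng.gauss(0, 3)
    e = rng.gauss(0, 0.3)
    data.append([t + e, t - e])

X = center(data)
w = top_direction(X)
print("direction:", w)
print("index along w:", variance_index(X, w))
print("index along x-axis:", variance_index(X, [1.0, 0.0]))
```

On this toy data the direction found should lie close to the diagonal, and its index should be at least as large as the index of either coordinate axis. Any other projection index could be swapped in for `variance_index`, although a different index would generally need a different optimiser than power iteration.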

Similar resources

Stein’s Identity, Fisher Information, and Projection Pursuit: A Triangulation

Two separate structure discovery properties of Fisher’s LDF are derived in a mixture multivariate normal setting. One of the properties is related to Fisher information and is proved by using Stein’s identity. The other property is on lack of unimodality. The properties are used to give three selection rules for choice of informative projections of high dimensional data, not necessarily multiva...

Ultra-Fast Image Reconstruction of Tomosynthesis Mammography Using GPU

Digital Breast Tomosynthesis (DBT) is a technology that creates three-dimensional (3D) images of breast tissue. Tomosynthesis mammography detects lesions that are not detectable with other imaging systems. If image reconstruction time is on the order of seconds, tomosynthesis systems can be used to perform tomosynthesis-guided interventional procedures. This research has been designed to study u...

The Grassmannian Atlas: A General Framework for Exploring Linear Projections of High-Dimensional Data

Linear projections are one of the most common approaches to visualize high-dimensional data. Since the space of possible projections is large, existing systems usually select a small set of interesting projections by ranking a large set of candidate projections based on a chosen quality measure. However, while highly ranked projections can be informative, some lower ranked ones could offer impo...

Active Learning for Informative Projection Retrieval

We introduce an active learning framework designed to train classification models that use informative projections. Our approach uses the obtained low-dimensional models to find unlabeled data for annotation by experts. The advantage of our approach is that the labeling effort is expended mainly on samples that benefit models from the considered hypothesis class. This results in an im...

Learning Sparse Representations of High Dimensional Data on Large Scale Dictionaries

Learning sparse representations on data adaptive dictionaries is a state-of-the-art method for modeling data. But when the dictionary is large and the data dimension is high, it is a computationally challenging problem. We explore three aspects of the problem. First, we derive new, greatly improved screening tests that quickly identify codewords that are guaranteed to have zero weights. Second,...

Journal:
  • CoRR

Volume abs/1511.08762

Publication date: 2015